Software Technologies for High-Performance Parallel Signal Processing
Authors
Abstract
Kepner and Lebak, Software Technologies for High-Performance Parallel Signal Processing, Lincoln Laboratory Journal, Volume 14, Number 2, 2003, p. 190

Abstraction and Overhead

A major goal of PVL is that a reasonably mapped application should achieve performance close to that of the native mathematical and message-passing libraries on a given platform. To examine our success in meeting this goal, we measured the overhead PVL adds to an optimized FFT operation. This section describes these performance results in more detail.

Understanding the PVL FFT. The use of the PVL FFT can be divided into three phases: declaration, setup, and execution. The PVL FFT is distinguished by two characteristics: the input vector and the output vector each have an associated map that describes how the vector is distributed, and the FFT object has its own computation map, which can differ from both the input and the output vector maps. The computation map describes how and where the computations take place.

Methodology. We measured computation times for out-of-place complex-to-complex vector FFT operations using vectors of length L = 2^n, n ∈ {8, 9, ..., 14} (i.e., L ranged from 256 to 16,384). These lengths were chosen because they cover the range of interest for embedded processing. Each function of interest was run for a given number of iterations and the total time was recorded. The first iteration was not counted because cache effects were likely to be most pronounced the first time the function was called. Measured times covered only the computation portion of the program, and did not include the time to create or construct the FFT object, allocate memory for the vectors, or perform other functions. Computation rates were calculated by assuming 5nL floating-point operations per complex FFT.
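The measurement procedure described above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the authors' benchmark: the function names (`fft_out_of_place`, `fft_flops`, `seconds_per_fft`, `fft_mflops`) are hypothetical, and a plain radix-2 FFT stands in for the FFTW-backed PVL FFT object so that the sketch stays self-contained. It shows the three points made in the text: setup is kept outside the timed region, the first call is discarded, and the rate is derived from the assumed 5nL operation count.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Stand-in out-of-place radix-2 FFT. This is an assumption for illustration
// only; it is neither FFTW nor the PVL FFT object.
void fft_out_of_place(const cvec& in, cvec& out) {
    const std::size_t L = in.size();
    assert(L != 0 && (L & (L - 1)) == 0);  // length must be a power of two
    out.resize(L);

    std::size_t bits = 0;
    while ((std::size_t{1} << bits) < L) ++bits;

    // Copy the input into bit-reversed order.
    for (std::size_t i = 0; i < L; ++i) {
        std::size_t r = 0;
        for (std::size_t b = 0; b < bits; ++b)
            if (i & (std::size_t{1} << b)) r |= std::size_t{1} << (bits - 1 - b);
        out[r] = in[i];
    }

    // Iterative butterflies, one pass per power of two.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= L; len <<= 1) {
        const std::complex<double> wlen =
            std::polar(1.0, -2.0 * pi / static_cast<double>(len));
        for (std::size_t i = 0; i < L; i += len) {
            std::complex<double> w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const std::complex<double> u = out[i + k];
                const std::complex<double> v = out[i + k + len / 2] * w;
                out[i + k] = u + v;
                out[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Operation count assumed in the text: 5nL for a complex FFT of length L = 2^n.
double fft_flops(unsigned n) {
    const double L = std::ldexp(1.0, static_cast<int>(n));
    return 5.0 * static_cast<double>(n) * L;
}

// Average time per FFT in seconds. The first call is made outside the timed
// region so that cold-cache effects are not counted. The caller allocates the
// vectors, so allocation is not part of the measurement either.
double seconds_per_fft(const cvec& in, cvec& out, int iterations) {
    fft_out_of_place(in, out);  // discarded warm-up iteration
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) fft_out_of_place(in, out);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count() / iterations;
}

// Computation rate in Mflop/s for a length-2^n FFT that took `seconds`.
double fft_mflops(unsigned n, double seconds) {
    return fft_flops(n) / seconds / 1.0e6;
}
```

For n = 8 the assumed count is 5 · 8 · 256 = 10,240 operations per FFT, so a hypothetical measured time of 10 microseconds would correspond to 1,024 Mflop/s.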
In all cases we ran the measurement program six times, averaged the time over a thousand iterations each time, and used the minimum of the six results.

Performance Results. The PVL FFT object whose performance is measured here used the Fastest Fourier Transform in the West (FFTW) library [5]. Because this library is a collection of self-optimizing FFT routines that has been shown to outperform vendor-optimized libraries in some cases, we can be confident that it represents a well-performing FFT operation. Figure 10 shows the execution time of PVL relative to FFTW. It is clear that PVL adds very little overhead to the underlying FFTW call.

FIGURE 10. Overhead for PVL fast Fourier transform (FFT) execution time, relative to that of the Fastest Fourier Transform in the West (FFTW) library for vectors of various lengths. In all cases the overhead is less than 2%.

FIGURE 11. C++ expression templates. C++ typically requires the use of temporary variables in order to write high-level mathematical expressions. Obtaining high performance from C++ requires technology such as expression templates that eliminate the normal creation of temporary variables in an expression.

Abstraction and Optimization

PVL is implemented by using the C++ programming language, which allows the user to write programs using high-level mathematical constructs such as
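The expression-template idea named in the Figure 11 caption can be illustrated with a small, generic C++ sketch. It is not PVL code: the class names `Expr`, `Vec`, `Sum`, and `Prod` are hypothetical, and the example only shows the general technique. The operators build lightweight expression nodes instead of computing results, and the assignment operator evaluates the whole expression in a single loop, so no temporary vector is created for the intermediate product or sum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: lets the operators below accept any expression type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Concrete vector that owns its storage.
class Vec : public Expr<Vec> {
    std::vector<double> data_;

public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}

    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }

    // Evaluate an arbitrary expression element by element in one loop.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }
};

// Expression node for an elementwise sum. It stores references only, so it
// must not outlive the full expression in which it was created.
template <typename L, typename R>
class Sum : public Expr<Sum<L, R>> {
    const L& lhs_;
    const R& rhs_;

public:
    Sum(const L& l, const R& r) : lhs_(l), rhs_(r) {}
    std::size_t size() const { return lhs_.size(); }
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
};

// Expression node for an elementwise product, with the same lifetime caveat.
template <typename L, typename R>
class Prod : public Expr<Prod<L, R>> {
    const L& lhs_;
    const R& rhs_;

public:
    Prod(const L& l, const R& r) : lhs_(l), rhs_(r) {}
    std::size_t size() const { return lhs_.size(); }
    double operator[](std::size_t i) const { return lhs_[i] * rhs_[i]; }
};

// The operators only build nodes; no arithmetic happens here.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <typename L, typename R>
Prod<L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return Prod<L, R>(l.self(), r.self());
}
```

With these definitions, a statement such as `a = b + c * d;` on four `Vec` objects of equal length compiles to one loop that computes `b[i] + c[i] * d[i]` for each element. Storing operands by reference keeps the nodes cheap, at the cost that an expression must be assigned in the same statement that builds it.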
Similar resources
Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the CUDA environment (Research Article)
Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic war, sonar, etc. The beamforming methods like Minimum Variance Distortionless Response (MVDR), Delay-and-Sum (DAS), and subspace-based Multiple Signal Classification (MUSIC) are the most known DOA estimation techniques. The mentioned methods have high computational complexity. Hence using...
Design and Implementation of a High Speed Systolic Serial Multiplier and Squarer for Long Unsigned Integer Using VHDL
A systolic serial multiplier for unsigned numbers is presented which operates without zero words inserted between successive data words, outputs the full product and has only one clock cycle latency. The multiplier is based on a modified serial/parallel scheme with two adjacent multiplier cells. Systolic concept is a well-known means of intensive computational task through replication of func...
Signal detection based on GPU-based parallel processing in infrastructure-based acoustic sensor networks
Nowadays, several infrastructure-based low-frequency acoustical sensor networks are employed in different applications to monitor the activity of diverse natural and man-made phenomena, such as avalanches, earthquakes, volcanic eruptions, severe storms, super-sonic aircraft flights, etc. Two signal detection methods are usually implemented in these networks for the purpose of event occurrence i...
Design and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)
Radar Signal Processing has been an interesting area of research for realization of programmable digital signal processor using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for implementation of high speed application specific real-time systems especially for high resolution radar. CORDIC algorithm, in recent times, is turned out to...
Performance of the Wavelet Transform-Neural Network Based Receiver for DPIM in Diffuse Indoor Optical Wireless Links in Presence of Artificial Light Interference
Artificial neural network (ANN) has application in communication engineering in diverse areas such as channel equalization, channel modeling, error control code because of its capability of nonlinear processing, adaptability, and parallel processing. On the other hand, wavelet transform (WT) with both the time and the frequency resolution provides the exact representation of signal in both doma...